Building a Travel Demand Model for the Oklahoma City Metropolitan Statistical Area

Introduction + Demographics

The Oklahoma City (OKC) metropolitan statistical area (MSA) is composed of seven counties centrally located in Oklahoma: Canadian County, Cleveland County, Grady County, Lincoln County, Logan County, McClain County, and Oklahoma County. Across these counties there are a total of 363 census tracts (based on the 2010 census tracts). The MSA has a total land area of approximately 14,275 square kilometers (1,427,500 hectares). Together, these counties are home to 1,382,800 people, according to the 2019 ACS 5-Year Estimates. The Oklahoma City MSA makes up about 35% of the state’s total population of 3,932,900.

The Oklahoma City MSA is predominantly white, with 64.3% of residents identifying as white alone, followed by 13.3% Hispanic or Latino, 10.1% Black, 3.3% American Indian and Alaska Native, 3.1% Asian, and 0.1% Native Hawaiian and Pacific Islander. By age, 24.7% of residents in the study area are under the age of 18, 25.0% are ages 18-34, 36.8% are ages 35-64, and 13.6% are ages 65 and older. The median household income in 2019 was $59,084 (Figure @ref(fig:income-distribution)), higher than the statewide average and lower than the national average. About 13.9% of residents live below the poverty line (Figure @ref(fig:poverty-line)), lower than the statewide average and higher than the national average.

The Median Household Income in the Oklahoma City MSA was $59,084 in 2019


Households Living Below the Federal Poverty Line are Concentrated Near Downtown Oklahoma City


Transportation in OKC

EMBARK, the area’s transit authority, operates all public transit in greater Oklahoma City, which includes fixed-route bus service, the OKC Streetcar, paratransit service, river ferry transit, and a bikeshare network. Beyond transit, car ownership and use are prevalent in the Oklahoma City MSA, with only 2.6% of households reporting that they do not have access to a vehicle (Figure @ref(fig:veh-ownership)), and a median number of vehicles per household of 2.3. According to the 2019 ACS 5-Year Estimates, cars, trucks, or vans are the most common means of transportation to work for workers 16 and over (used by 92.6% of respondents), while commuting by public transportation is far less common (used by 0.5% of respondents). Based on the same ACS data, the average travel time to work is 23 minutes.

Most Households in the Oklahoma City MSA Own at least One Vehicle


Road Network

The Oklahoma City MSA’s road network has developed much like those of many cities in the United States. Downtown OKC, the oldest part of the MSA, largely follows a grid pattern. Toward the periphery, the network becomes much sparser and transitions to a less uniform, curvilinear pattern.

Road Network Summary Statistics

Road classifications are determined by OpenStreetMap, from which we extracted our road network.

  • Primary roads: 438.3 miles
  • Secondary roads: 982.2 miles
  • Tertiary roads: 1550.4 miles
  • Trunk roads: 299.0 miles
  • Links/Centroid Connectors: 3306.2 miles

Our Model

The Components of our Model: Roads, Centroids, Centroid Connectors, and Transit Lines


To create the network in TransCAD (Figure @ref(fig:geography)), we imported the OpenStreetMap layer and selected primary, secondary, tertiary, and trunk roads. When creating the centroids and centroid connectors, we allowed connectors to extend outside of zone boundaries, up to 30 miles in length, with up to 10 connectors per centroid. This resulted in a matrix with only a few gaps, which we resolved by turning surrounding roads two-way. (The major roads in the area were separated highways, so we trust that two-way driving is actually feasible in these zones.)

Assumptions that were incorporated into our road network include:

  • Vehicle speed on primary roads is: 60 mph
  • Vehicle speed on secondary roads is: 40 mph
  • Vehicle speed on tertiary roads is: 30 mph
  • Vehicle speed on trunk roads is: 60 mph
  • Vehicle speed on link/centroid connectors is: 15 mph
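As a rough illustration of how these speed assumptions translate into travel times, the sketch below computes free-flow minutes for a path made of links in different road classes. The function names and the link lengths in `example_path` are hypothetical, and TransCAD's own skimming procedure may differ in detail.

```python
# Assumed free-flow speeds (mph) by road class, taken from the list above.
SPEED_MPH = {
    "primary": 60,
    "secondary": 40,
    "tertiary": 30,
    "trunk": 60,
    "connector": 15,
}


def link_minutes(length_miles, road_class):
    """Free-flow travel time in minutes for a single link."""
    return length_miles / SPEED_MPH[road_class] * 60.0


def path_minutes(links):
    """Sum free-flow minutes over a sequence of (length_miles, road_class) links."""
    return sum(link_minutes(miles, road_class) for miles, road_class in links)


# Hypothetical path: a centroid connector, then a tertiary road, then a primary road.
example_path = [(0.5, "connector"), (3.0, "tertiary"), (10.0, "primary")]
print(f"Free-flow time: {path_minutes(example_path):.1f} minutes")
```

Under these assumptions the half-mile connector alone contributes two minutes, which shows how much the low connector speed can weigh on short trips.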

Assumptions that were incorporated into our model when we created our transit network include:

  • Everyone walks to transit
  • Transit vehicle speed is 23 mph
  • Walking speed is 2.5 mph
  • There is an assumed 15-minute wait time, max wait time, and transfer time
  • Fares are $2 and there is a $0 transfer fee
  • EMBARK operates a streetcar which was changed in TransCAD to exist as rail

The charts below illustrate how the timing of various transit trips breaks down across walk, wait, transfer and travel time (Figure @ref(fig:stacked-bar-chart)).

Travel Time Makes Up the Bulk of Most Trips, but Transfer Wait Time is Also A Large Portion of Trip Duration


Land Use: Employment & Population Density

Major employment sectors in the Oklahoma City metro area include government, higher education, aerospace, healthcare, and retail. According to the Greater Oklahoma City Chamber of Commerce, major employers include the State of Oklahoma, Tinker Air Force Base, the University of Oklahoma, Integris Health, and Amazon. In 2021, the metropolitan area added 10,825 jobs (1.7% increase), and further job growth was expected in 2022 as stated in the 2022 Greater Oklahoma City Economic Outlook.

Assumptions inherent in our model related to employment and population include:

  • Everyone lives at the centroid of their zone
  • Retail jobs represent both employment and attractions
  • We assume that employment exists in three categories (Figure @ref(fig:employment-county)): retail, service, and basic
  • We also include total employment as a measure of all three categories

Oklahoma County has the Most Employees in the MSA, and the Majority Are Service Sector Workers


OKC generally follows a monocentric pattern of development with the greatest density of people living and working in downtown Oklahoma City (Figure @ref(fig:emp-density), Figure @ref(fig:pop-density)), located in the center of the MSA.

This employment density aligns with heightened public transit connectivity, too. In the urban core of Oklahoma City, we notice an overlap between a central business district, higher employment density, and increased public transit connectivity.

Population Density is Greatest in the Center of the Oklahoma City MSA


Job Density is Also Concentrated in the Center of the Oklahoma City MSA


Accessibility

The measurement of accessibility has been suggested as the ultimate benchmark for evaluating the effectiveness of land-use and transportation systems. By gauging accessibility, one can evaluate how much certain demographic groups benefit from a particular land-use and transportation system. This analysis can yield a significant socio-spatial report for urban planners, informing them on how to better plan and implement these systems (El-Geneidy and Levinson 2022; Wachs and Kumagai 1973).

After calculating and creating our initial visualization of zonal-based accessibility, we came to understand that the number of zones one can reach is less relevant than the number of opportunities one can reach. Our definition of accessibility changed to reflect this understanding. We turn to the definition provided by Carole: accessibility is the ease with which a person can fully participate in the economic, civic, and social life of their society. In our model, we have chosen to limit accessibility to access to job opportunities. Our model still holds assumptions; for example, zones are assigned value based on the jobs they contain. In this way we employ a gravity-based accessibility measure that considers the attractiveness and distance of destinations within a particular area: the more attractive a destination is and the closer it is to a particular location, the more accessible it is to people living or working in that area.

We have decided to keep measures of car and transit accessibility separate, rather than create a combined metric of the two (Figure @ref(fig:access-indices), Figure @ref(fig:access-histogram)). We are more interested in comparing access across the two different modes than conceptualizing one measure of both car and transit accessibility. However, we did weight transit travel time to account for perceptions that out-of-vehicle travel time on public transportation trips is more onerous than in-vehicle travel time, multiplying OVTT by 1.5.
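The weighting described above can be sketched as follows. The 1.5 multiplier comes from the text; the split of out-of-vehicle time into access, initial wait, transfer wait, and egress components is an assumption based on the column names in the technical appendix, and the function name is hypothetical.

```python
OVTT_WEIGHT = 1.5  # out-of-vehicle minutes count 1.5 times, as stated above


def perceived_minutes(ivtt, access=0.0, initial_wait=0.0, transfer_wait=0.0, egress=0.0):
    """Perceived transit time: in-vehicle minutes plus weighted out-of-vehicle minutes.

    Treating these four components as the out-of-vehicle total is an assumption.
    """
    ovtt = access + initial_wait + transfer_wait + egress
    return ivtt + OVTT_WEIGHT * ovtt


# Hypothetical trip: 20 minutes riding, 5-minute walks at each end, one 15-minute wait.
print(perceived_minutes(ivtt=20, access=5, initial_wait=15, egress=5))
```

In this hypothetical trip, 45 minutes of clock time is perceived as 57.5 minutes, so transit looks less competitive than raw travel time alone would suggest.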

The Transit Hub South of Downtown Oklahoma City is A Recognizable Blip on this Plot Showing Car and Transit Accessibility Plotted Against One Another


Nearly Two-Thirds of Zones Do Not Have Any Transit Accessibility


For our models, we selected the exponential function for car travel and the logistic function for transit travel (Figure @ref(fig:spatial-access-car), Figure @ref(fig:spatial-access-transit)). Our rationale is that the exponential function captures the long-reaching value of a private vehicle in achieving accessibility, as the measure of accessibility never quite reaches 0, no matter how long the travel time. The benefit of a logistic function for transit, however, reflects the perception of great access within short trips on a transit network. For both functions, we set a half-life/inflection point of 30 minutes, which aligns with our personal conceptions of reasonable work- and personal-trip distances.
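A minimal sketch of the two decay functions and the resulting gravity-style accessibility score is shown below. The 30-minute half-life and inflection point come from the text. The logistic steepness (`LOGISTIC_SCALE_MIN`) is not given in the text and is purely an assumption here, as are the function names and the toy job counts.

```python
import math

HALF_LIFE_MIN = 30.0       # half-life / inflection point stated in the text
LOGISTIC_SCALE_MIN = 5.0   # ASSUMPTION: steepness of the logistic curve, not given in the text


def exp_decay(minutes):
    """Exponential decay: weight halves every HALF_LIFE_MIN minutes and never reaches 0."""
    return 0.5 ** (minutes / HALF_LIFE_MIN)


def logistic_decay(minutes):
    """Logistic decay: near 1 for short trips, 0.5 at the inflection point, near 0 beyond it."""
    return 1.0 / (1.0 + math.exp((minutes - HALF_LIFE_MIN) / LOGISTIC_SCALE_MIN))


def accessibility(jobs, minutes, decay):
    """Sum of destination jobs weighted by travel-time decay; None means unreachable."""
    return sum(j * decay(t) for j, t in zip(jobs, minutes) if t is not None)


# Toy example: three destination zones, the third unreachable by transit.
jobs = [1000, 500, 2000]
car_minutes = [10, 30, 60]
transit_minutes = [25, 55, None]
print(accessibility(jobs, car_minutes, exp_decay))
print(accessibility(jobs, transit_minutes, logistic_decay))
```

Both functions give a weight of exactly 0.5 at 30 minutes; they differ in how quickly the weight falls on either side of that point.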

Accessibility by Car is Strongest in Downtown Oklahoma City, with Some Zones on the Periphery Indicating Very Low Accessibility


Accessibility by Transit is Concentrated in the Downtown Area, with Some Accessibility around a Southern Transit Hub in Norman


Trip Generation + Distribution

HBO HBW NHB
332 106 266

Estimating Trip Productions

To generate trip production estimates for our model, we ran linear regressions using observed trip counts by purpose (home-based other, home-based work, and non-home-based) as the dependent variable, and four household characteristics pulled from American Community Survey data as the predictors. The census variables we used for these regressions differed somewhat from those we collected at an earlier stage in our model-building. Initially, we were interested in understanding whether income, family structure, vehicle ownership, and age had any relationship with travel behavior. Upon reviewing the NHTS data, we realized we needed to adjust some of our tract characteristic variables to better align with the ways variables are structured in NHTS. For income, we changed the variable structure from categorical brackets to a continuous number—the median income for each zone. For family structure, we opted for a binary variable that would note the presence of children in the household, regardless of the number of parents or their gender. For vehicle ownership, we found the variable to be insignificant in the first runs of the models, so we switched it out for measures of homeownership instead.

To support our choice of variables, we looked to the transportation planning literature to understand how others have analyzed the relationships between socioeconomic characteristics and travel behavior. Specifically, we noted that a 2013 paper by Boschmann and Brady found that with increasing age, travelers make fewer and shorter trips, women make fewer and shorter trips than men, and persons with disabilities make the fewest trips compared with all other persons. Similarly, Agrawal et al. found that low-income individuals have distinct travel needs and that the cost of travel has implications for how individuals travel. Finally, Rosenbloom finds that the presence of children changes travel behavior, more drastically for women than men.

Based on all of this, we chose to employ the following variables in our model:

  • Median income (continuous variable, in thousands of dollars)
  • Presence of children in the household (true or false, not further differentiated by gender or number of parents)
  • Homeownership (true or false)
  • Households with residents age 65 or older (true or false)

While the variables we chose were based on prior evidence outlined above, we recognize their limits in fully describing households and accurately forecasting travel behavior. For example, while both the presence of children and the presence of residents aged 65 and older tell us something about household structure, they are both proxies for more specific characteristics that define travel patterns (school or daycare enrollment in the case of children, or non-participation in the labor force in the case of people 65 and older) which could be more accurately measured and modeled if those specific data were available. Further, we chose to incorporate an indicator of homeownership as a more individualized measure of household wealth to supplement the zone-level median income.

In running the linear regression models, we kept all the variables formatted as listed above except for income, to which we applied a base-2 log transformation (of income in thousands of dollars) so that the coefficient describes the effect of doubling one’s income rather than increasing it by a single dollar. All three models have R-squared values between 0.11 and 0.20, which means that between 11% and 20% of the variation in trip production can be attributed to the predictive variables we selected. This is in keeping with what we would expect for a model using a limited set of characteristics to predict something as complex as travel behavior. A more accurate prediction would likely require separate models for each trip purpose, using a greater number of independent variables, and many iterations of testing to determine the right combination of these variables that best fits the observed travel behavior. Likewise, a more accurate model would rely on more local travel survey data, rather than subsetting the NHTS for our study area, which we have done here for the sake of time and data availability.

Home-Based Other Model

Observations 95 (10 missing obs. deleted)
Dependent variable HBO_trips
Type OLS linear regression
F(4,90) 4.99
R² 0.18
Adj. R² 0.15
Est. S.E. t val. p
(Intercept) -0.39 1.40 -0.28 0.78
log2(inc_k) 0.47 0.24 1.95 0.05
any_kidsTRUE 2.84 0.82 3.45 0.00
homeownershipTRUE 0.25 0.77 0.33 0.74
age_65TRUE 1.48 0.72 2.05 0.04
Standard errors: OLS
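To make the coefficients in the table above concrete, the sketch below applies them to a hypothetical household. The coefficient values are copied from the Home-Based Other table; the function name and the example household are illustrative assumptions, and the sketch ignores the standard errors shown in the table.

```python
import math

# Estimates copied from the Home-Based Other regression table above.
HBO_COEFS = {
    "intercept": -0.39,
    "log2_inc_k": 0.47,
    "any_kids": 2.84,
    "homeownership": 0.25,
    "age_65": 1.48,
}


def predict_hbo_trips(income_k, any_kids, homeowner, age_65):
    """Predicted home-based other trips for one household (income in thousands of dollars)."""
    return (
        HBO_COEFS["intercept"]
        + HBO_COEFS["log2_inc_k"] * math.log2(income_k)
        + HBO_COEFS["any_kids"] * any_kids
        + HBO_COEFS["homeownership"] * homeowner
        + HBO_COEFS["age_65"] * age_65
    )


# Hypothetical household: $64k income, children present, homeowner, nobody 65 or older.
base = predict_hbo_trips(64, True, True, False)
halved = predict_hbo_trips(32, True, True, False)
print(round(base, 2), round(base - halved, 2))
```

Because income enters as a base-2 logarithm, the gap between the $32k and $64k households equals the income coefficient itself (0.47 trips), which is the "effect of doubling" interpretation described earlier.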

Home-Based Work Model

Observations 95 (10 missing obs. deleted)
Dependent variable HBW_trips
Type OLS linear regression
F(4,90) 2.68
R² 0.11
Adj. R² 0.07
Est. S.E. t val. p
(Intercept) 0.33 0.62 0.53 0.60
log2(inc_k) 0.13 0.11 1.18 0.24
any_kidsTRUE 0.24 0.37 0.65 0.52
homeownershipTRUE 0.27 0.34 0.77 0.44
age_65TRUE -0.73 0.32 -2.27 0.03
Standard errors: OLS

Non-Home-Based Model

Observations 95 (10 missing obs. deleted)
Dependent variable NHB_trips
Type OLS linear regression
F(4,90) 5.65
R² 0.20
Adj. R² 0.17
Est. S.E. t val. p
(Intercept) 1.82 1.26 1.44 0.15
log2(inc_k) 0.07 0.22 0.32 0.75
any_kidsTRUE 2.93 0.74 3.94 0.00
homeownershipTRUE 0.28 0.69 0.40 0.69
age_65TRUE -0.88 0.65 -1.35 0.18
Standard errors: OLS

Estimating Trip Attractions

To estimate trip attractions for our model, we relied on the coefficients for “motorized person trips” found in NCHRP 716. Using these values, we calculated attractions for each trip type within each zone, and then balanced them with the productions to ensure an equal number of trips produced and trips attracted in the overall study area.
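One common way to balance attractions to productions is to scale every zone's attractions by a single regional factor. The sketch below assumes that approach (holding productions fixed); the text does not say which side was held fixed, and the zone totals are made-up numbers.

```python
def balance_attractions(productions, attractions):
    """Scale attractions so their regional total matches total productions.

    Holding productions fixed is an assumption; the text only says the two were balanced.
    """
    factor = sum(productions) / sum(attractions)
    return [a * factor for a in attractions]


# Toy zones: 200 trips produced but 400 attracted before balancing.
productions = [100.0, 50.0, 50.0]
attractions = [300.0, 60.0, 40.0]
balanced = balance_attractions(productions, attractions)
print(balanced, sum(balanced))
```

Each zone keeps its relative share of attractions; only the regional total changes.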

As expected, productions and attractions are primarily concentrated in the downtown area—attractions even more so than productions (Figure @ref(fig:density-prod), @ref(fig:density-attr)). Both generally follow the density maps of residences and employment that we created earlier in the semester.

Trip Productions are Primarily Concentrated in the Downtown Area


Trip Attractions are Even More Concentrated in the Downtown Area


Assessing Average Travel Time by Trip Purpose (based on NHTS data)

By summarizing the NHTS trips by purpose we found that Home-Based Work trips had the longest average travel time of 26.45 minutes, while both Home-Based Other and Non-Home-Based trips were significantly shorter with average times of 17.15 and 16.33 minutes respectively. These estimates aligned with our expectations that residents are typically more willing to travel farther to their jobs than for other day-to-day purposes.

Determining Travel Flows Using an Exponential Deterrence Function

Based on the observed values from NHTS data, we calibrated our exponential deterrence function to best fit those travel times. We decided to use an exponential deterrence function in keeping with our exponential accessibility decay, which reflects the same assumed value of travel time. The exponential function is written as: \[\begin{equation} F_{ijp} = e^{-mt_{ij}} \end{equation}\] where \(F_{ijp}\) is the friction factor for trips with purpose p between zone i and zone j, \(m\) is the travel-time sensitivity parameter (calibrated separately for each purpose), and \(t_{ij}\) is the travel time from zone i to zone j.

We adjusted the m value for each purpose to correspond most closely with the average travel time found in the NHTS data. For Home-Based Work trips we used an exponent coefficient of -0.06 (m = 0.06), for Home-Based Other trips -0.18 (m = 0.18), and for Non-Home-Based trips -0.2 (m = 0.2). The resulting estimates in our model were all within 0.2 minutes of the NHTS averages. Though still relatively low, the higher-magnitude m values for home-based other and non-home-based trips suggest that those trip types are more sensitive to travel time in our model. This also matches our expectations that home-based work trips are the least sensitive to travel time, as they are the least “negotiable” trips a person typically takes in a given day.

purpose nhts_avg_time pred_time time_dif
HBO 17.15898 17.00236 0.1566160
HBW 26.45428 26.52832 -0.0740393
NHB 16.33240 16.13501 0.1973930
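The calibration behind the table above can be sketched as a search over m until the modeled average travel time matches a target. The sketch below assumes a production-constrained gravity model and a simple bisection search; the text does not state which gravity formulation or search procedure was used, and the three-zone inputs are toy values.

```python
import math


def gravity_flows(productions, attractions, times, m):
    """Production-constrained gravity model with friction factor exp(-m * t)."""
    flows = []
    for i, p_i in enumerate(productions):
        weights = [a_j * math.exp(-m * times[i][j]) for j, a_j in enumerate(attractions)]
        total = sum(weights)
        flows.append([p_i * w / total for w in weights])
    return flows


def mean_travel_time(flows, times):
    """Trip-weighted average travel time in minutes."""
    trips = sum(sum(row) for row in flows)
    minutes = sum(f * t for f_row, t_row in zip(flows, times) for f, t in zip(f_row, t_row))
    return minutes / trips


def calibrate_m(productions, attractions, times, target, lo=0.0, hi=1.0, iters=60):
    """Bisection on m: a larger m penalizes long trips more, lowering the average time."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        avg = mean_travel_time(gravity_flows(productions, attractions, times, mid), times)
        if avg > target:
            lo = mid  # trips are too long on average, so increase sensitivity
        else:
            hi = mid
    return (lo + hi) / 2


# Toy three-zone example (hypothetical values, not the study data).
P = [100.0, 200.0, 150.0]
A = [150.0, 100.0, 200.0]
T = [[5.0, 20.0, 40.0], [20.0, 5.0, 25.0], [40.0, 25.0, 5.0]]
m_hat = calibrate_m(P, A, T, target=12.0)
print(round(m_hat, 4), round(mean_travel_time(gravity_flows(P, A, T, m_hat), T), 2))
```

Bisection works here because the average travel time falls steadily as m rises; a target outside the range achievable between `lo` and `hi` would simply converge to one of those bounds.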

Visualizing Travel Flows by Purpose and County

To better understand trip distribution within the MSA, we mapped the most frequent origin-destination pairs (those with 250 trips or more between them) by trip type. We noticed that Home-Based Other trips are the most spatially distributed across the region, while Home-Based Work trips are the most concentrated, particularly around what we believe to be the major employment centers just south of Oklahoma City.

Home-Based Other Flows

Home-Based Work Flows

Non-Home-Based Flows


Home-Based Other Trips by County

We also created three chord diagrams showing the intra- and inter-county flows by trip purpose (Figure @ref(fig:hbo-chord), @ref(fig:hbw-chord), @ref(fig:nhb-chord)). This allows us to better understand which counties are more attractive or productive for each trip purpose. For example, we noticed that in Cleveland County (southeast of downtown), home-based other and non-home-based trips are more than half of all trips into the county, while home-based work trips are a smaller share. For all trip types with destinations in Cleveland County, Oklahoma County (centrally located) was a major producer, providing yet another illustration of the close relationship between these two central areas in the MSA.

Intra- and Inter-County Flows for Home-Based Other Trips

Home-Based Work Trips by County

Intra- and Inter-County Flows for Home-Based Work Trips

Non-Home-Based Trips by County

Intra- and Inter-County Flows for Non-Home-Based Trips

Mode Choice

Calculating Travel Cost

To calculate the driving cost of each trip, we estimated the straight-line distance between the origin and destination centroids, converted that distance to miles, and then applied the 2023 IRS mileage reimbursement rate ($0.655 per mile). The most expensive trip costs about $60 to cover 91.4 miles, while the cheapest trip costs about $0.13 to cover 0.2 miles.

We acknowledge that this is a rough approximation because it uses “as the crow flies” distance, rather than the shortest path traveled along the roadway between two centroids. We decided to take this approach because we believe that the IRS mileage rate more fully captures the cost of driving than using NHTS data to approximate cost by minute based on fuel expenses, even with this distance limitation. In the future, this method could be updated using more accurate distances along the road network.
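The cost calculation described above can be sketched as follows. The $0.655 rate is from the text; using the haversine formula for straight-line distance, the Earth radius constant, and the function names are assumptions, since the text does not say how the distance was computed.

```python
import math

IRS_RATE_2023 = 0.655        # dollars per mile, from the text
EARTH_RADIUS_MILES = 3958.8  # ASSUMPTION: mean Earth radius used for the haversine formula


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def drive_cost(miles):
    """Driving cost in dollars at the 2023 IRS mileage rate."""
    return miles * IRS_RATE_2023


# The longest trip reported above: 91.4 miles.
print(round(drive_cost(91.4), 2))
```

Applied to the first origin-destination row in the technical appendix (6.302551 drive miles), the same multiplication reproduces the listed drive cost of about $4.13.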

For HOV trips, we established carpool costs using the “Carpool 2+ only—daily” assumptions from NCHRP 716, which divides the previously calculated driving cost by an average number of additional passengers.
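The text does not list the divisors used for each purpose. In the sample rows of the technical appendix, however, the ratio of `drive_cost` to each carpool cost column is about 2.71 for HBO, 2.42 for HBW, and 2.75 for NHB, so the sketch below uses those inferred values. Treat them as an inference from the sample data rather than as figures confirmed by the text.

```python
# Divisors INFERRED from the appendix sample rows (drive_cost / carpool_cost_*),
# not stated in the text.
CARPOOL_DIVISOR = {"HBO": 2.71, "HBW": 2.42, "NHB": 2.75}


def carpool_cost(drive_cost_dollars, purpose):
    """Per-traveler carpool cost: the driving cost split by the purpose-specific divisor."""
    return drive_cost_dollars / CARPOOL_DIVISOR[purpose]


# First origin-destination row in the technical appendix: drive cost of $4.128171.
for purpose in ("HBO", "HBW", "NHB"):
    print(purpose, round(carpool_cost(4.128171, purpose), 4))
```

Because the work-trip divisor is the smallest, carpooling to work saves slightly less per traveler than carpooling for other purposes.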

For cost per transit trip, we had already included a transit fare when we initially created our network and skim. As an estimated average of the EMBARK system’s different fares, we assigned each trip a $2.00 fare with free transfers.

Mode-by-Purpose Table

In setting up the mode-by-purpose table, we noted the overwhelming share of home-based work trips that rely on single-occupancy vehicles (90 percent).

purpose pct_SOV pct_HOV pct_transit
HBO 0.4297638 0.5415895 0.0286467
HBW 0.9053029 0.0715927 0.0231044
NHB 0.5104273 0.4771184 0.0124543

Model Coefficients

When choosing our model coefficients for each trip purpose, we opted against a nested logit model in order to maintain consistency across all models. At the outset, we did not think nesting would fundamentally change our calculations because we posit that people treat driving alone and riding in a shared car as distinct choices. We do acknowledge, though, that this choice overweights the probability of driving: by treating driving alone and driving with someone else as entirely distinct alternatives, the model gives travelers two auto-based options against a single transit option.

For all three trip purposes, we selected a model based on its ability to appropriately fit our study area in the following ways:

  • More than 1 million people
  • Excluding non-motorized modes of travel (walking and biking)
  • Including auto submodes (driving alone and sharing a ride)
  • Excluding transit submodes

For home-based other trips, we selected model G. For home-based work trips, we selected model B. And for non-home-based trips, we selected model D. All model parameters and values can be found in NCHRP 716 (tables 4.7 – 4.14).
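With a non-nested (multinomial) logit, each mode's probability is its exponentiated utility divided by the sum across available modes. The sketch below illustrates that step only; it takes utilities as given rather than computing them from the NCHRP 716 coefficients, and treating a missing transit utility as "mode unavailable" is an assumption based on the NA values in the technical appendix.

```python
import math


def mode_probabilities(utilities):
    """Multinomial logit shares; a utility of None marks a mode as unavailable."""
    exp_u = {mode: math.exp(u) for mode, u in utilities.items() if u is not None}
    total = sum(exp_u.values())
    if total == 0:
        return {mode: 0.0 for mode in utilities}
    return {mode: exp_u.get(mode, 0.0) / total for mode in utilities}


# HBO utilities from the first origin-destination row in the technical appendix,
# where transit is not available (NA).
hbo = mode_probabilities({"SOV": -0.5942791, "HOV": -0.4859234, "transit": None})
print({mode: round(p, 4) for mode, p in hbo.items()})
```

The result matches the `p_SOV_HBO` and `p_HOV_HBO` values in that row (about 0.473 and 0.527). It also shows the limitation noted above: the two auto modes split the entire probability whenever transit is unavailable.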

Calibration

Through trial and error, we learned the nuances of balancing out coefficients across the different modes within each trip type. We aimed to be within a 5 percentage point threshold from the observed mode share by purpose data from the NHTS.

As initial values, all constants were set to the log odds of that particular mode for that purpose from the NHTS. If the resulting predicted value was above or below the accuracy threshold, the values were adjusted manually until the result was satisfactorily close.

The adjustments by purpose were:

  • HBO: we kept the SOV constant at -0.28, changed the HOV constant from 0.167 to -0.25, and kept the transit constant at -3.52.
  • HBW: we changed the SOV constant from 2.26 to 0.70, the HOV constant from -2.56 to -2.00, and the transit constant from -3.74 to -1.00.
  • NHB: all mode shares were within 2 percentage points of the NHTS shares, so we did not change the constants.
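The initial constants can be reproduced from the NHTS mode shares as log odds, as sketched below. The shares are copied from the mode-by-purpose table; the function and variable names are illustrative.

```python
import math

# NHTS mode shares copied from the mode-by-purpose table.
NHTS_SHARES = {
    "HBO": {"SOV": 0.4297638, "HOV": 0.5415895, "transit": 0.0286467},
    "HBW": {"SOV": 0.9053029, "HOV": 0.0715927, "transit": 0.0231044},
    "NHB": {"SOV": 0.5104273, "HOV": 0.4771184, "transit": 0.0124543},
}


def log_odds(share):
    """Natural log of the odds of choosing a mode with the given share."""
    return math.log(share / (1.0 - share))


initial_constants = {
    purpose: {mode: log_odds(share) for mode, share in shares.items()}
    for purpose, shares in NHTS_SHARES.items()
}

for purpose, constants in initial_constants.items():
    print(purpose, {mode: round(c, 2) for mode, c in constants.items()})
```

For HBO and HBW these log odds round to the starting values quoted above (for example, -0.28 for HBO SOV and 2.26 for HBW SOV), which were then hand-tuned toward the observed shares.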

Comparison Between NHTS Data and Model 1

purpose pct_SOV pct_HOV pct_transit
HBO 0.4297638 0.5415895 0.0286467
HBO_model_1 0.3703460 0.6293185 0.0002652
HBW 0.9053029 0.0715927 0.0231044
HBW_model_1 0.9964629 0.0035267 0.0000000
NHB 0.5104273 0.4771184 0.0124543
NHB_model_1 0.5098686 0.4900094 0.0000343

Comparison Between NHTS Data and Final Model

purpose pct_SOV pct_HOV pct_transit
HBO 0.4297638 0.5415895 0.0286467
HBO_model_1 0.3703460 0.6293185 0.0002652
HBO_model_2 0.4707985 0.5287810 0.0003479
HBW 0.9053029 0.0715927 0.0231044
HBW_model_1 0.9964629 0.0035267 0.0000000
HBW_model_2 0.9425489 0.0564987 0.0007734
NHB 0.5104273 0.4771184 0.0124543
NHB_model_1 0.5098686 0.4900094 0.0000343
NHB_model_2 0.5098686 0.4900094 0.0000343

Visualizations

In order to understand the relationship between trip length and mode choice, we plotted the percent of trips using a particular mode (as predicted by our model) against the drive time of that trip (Figure @ref(fig:mode-split-all)). We then further subset the data by purpose to illustrate the difference in mode share for different kinds of trips (Figure @ref(fig:mode-split-hbo), @ref(fig:mode-split-hbw), @ref(fig:mode-split-nhb)).

We used drive time as our preferred metric of travel time based on the same assumption we used in the trip generation and distribution calculation: that it will most often be the shortest trip between two zones. Because there are over 130,000 individual origin-destination pairs in our matrix, we decided to create binned scatter plots, which present the trend more clearly by reducing the visual noise. The bins are set at roughly 5-minute intervals in drive time, which divides the maximum drive time of approximately 425 minutes into about 85 bins. Within each of these bins, the mean share of each mode for all trips is calculated and presented.
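The binning step can be sketched as below. It assumes an unweighted mean across origin-destination pairs within each 5-minute bin, which is one reading of the description above; a trip-weighted mean would give different values. The sample pairs are made up.

```python
from collections import defaultdict

BIN_WIDTH_MIN = 5.0  # roughly 5-minute bins, as described above


def binned_mean_share(pairs, width=BIN_WIDTH_MIN):
    """Average mode share per drive-time bin.

    `pairs` is an iterable of (drive_minutes, mode_share); keys of the result are
    the lower edge of each bin. The mean is unweighted across OD pairs (an assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for minutes, share in pairs:
        lower_edge = (minutes // width) * width
        sums[lower_edge] += share
        counts[lower_edge] += 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}


# Hypothetical OD pairs: (drive time in minutes, predicted SOV share).
sample = [(2.0, 0.4), (4.0, 0.6), (7.0, 0.5), (92.0, 0.9)]
print(binned_mean_share(sample))
```

The last bin in this toy example holds a single pair, so its mean is just that pair's value. The same thing happens in sparse bins in the real data, where a few pairs can dominate the plotted point.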

While we think this approach makes the data more understandable, we acknowledge it comes with some limitations. First, because transit ridership overall is very low and there are many trips for which transit is not an option at all, taking the mean transit share results in a value of nearly 0. While this does reflect a general lack of transit use, it may also falsely give the impression that “no-one rides transit” when there are some origin-destination pairs where transit is significantly more common than others. Also, because a mean value is heavily affected by outliers, it can be a less useful or representative metric for small sample sizes. This comes up in particular for the bins of drive times above 90 minutes for which the number of trips is significantly smaller than, for example, the 5-10 minute bin.

We notice that for trips of about 45 minutes there is a slight bump in the share of HOV trips and a slight dip in the share of SOV trips. This happens again for trips of about 90 minutes. The plot also suggests that people are unlikely to carpool for trips of 2+ hours, across all trip purposes. The bumps fall on multiples of 15 minutes, suggesting an effect from people’s tendency to round time to familiar intervals.

We initially mapped this information to understand the spatial distribution of mode choice throughout our study area, too. However, it seemed repetitive of these scatter plots and did not contribute any new interpretations of our analysis. In particular, because the SOV and HOV mode shares are both approximately 40-55% of mode share for all trips, there was little range across the geography. We wonder if this lack of differentiation is a symptom of not having chosen a nested logit model.

Mode Split by Drive Time for all Trip Purposes


Mode Split by Drive Time for Home-Based Other Trips


Mode Split by Drive Time for Home-Based Work Trips


Mode Split by Drive Time for Non-Home-Based Trips


Trip Assignment


TKTK


## [1] 24150645
## [1] 2584

Technical appendix

Data dictionary

Overview of data structures

retail_emp inc_btw_15k_20kE inc_gt_200kE inc_btw_150k_200kE inc_btw_75k_100kE service_emp inc_btw_40k_45kE inc_btw_50k_60kE inc_btw_125k_150kE inc_btw_100k_125kE hh_povlevelE inc_btw_35k_40kE land_area_sqmeters total_hhsE inc_btw_30k_35kE hh_1personE inc_btw_45k_50kE inc_btw_60k_75kE hh_u18_singleparent_maleE tot_popE inc_btw_20k_25kE hh_3personE hh_4person_plusE hh_u18_married_coupleE tot_disabledE GEOID NAME no_vehE inc_btw_25k_30kE hh_2personE hh_u18_singleparent_femaleE total_emp hh_65plusE inc_lt_10kE basic_emp inc_btw_10k_15kE pop_density emp_density activity_density pct_veh hhs_with_kidsE med_incE owner_occE renter_occE geometry hhs_without_kidsE
16 65 59 68 389 102 139 163 51 175 83 38 9951952 1540 41 334 94 143 51 4371 25 167 530 1028 4293 40027202007 Census Tract 2020.07, Cleveland County, Oklahoma 6 10 509 169 613 270 0 495 80 0.0004392 0.0000616 0.0005008 0.9961039 584 75657 1249 291 POLYGON ((-97.47698 35.3773… 956
26 66 0 0 45 236 5 125 7 0 115 27 1295991 702 90 199 14 105 54 2043 40 157 215 302 2043 40109107217 Census Tract 1072.17, Oklahoma County, Oklahoma 22 66 131 265 291 202 39 29 73 0.0015764 0.0002245 0.0018009 0.9686610 358 32215 446 256 POLYGON ((-97.54794 35.4207… 344
53 0 0 0 0 540 0 0 0 0 4 0 988281 8 0 0 0 0 0 581 0 0 0 0 581 40109103602 Census Tract 1036.02, Oklahoma County, Oklahoma 0 4 8 0 988 4 0 395 4 0.0005879 0.0009997 0.0015876 1.0000000 0 72935 4 4 POLYGON ((-97.53013 35.4632… 8
148 64 65 42 255 484 40 230 71 143 55 71 6203874 2019 101 795 39 457 8 4021 60 255 175 320 3955 40027202002 Census Tract 2020.02, Cleveland County, Oklahoma 0 156 794 136 1138 551 84 506 141 0.0006481 0.0001834 0.0008316 1.0000000 437 70633 900 1119 POLYGON ((-97.52126 35.3771… 1582
514 36 80 63 117 1050 64 124 79 93 22 131 6752251 1370 55 534 24 158 6 3238 97 164 236 505 3113 40027201508 Census Tract 2015.08, Cleveland County, Oklahoma 9 171 436 219 1612 340 70 48 8 0.0004795 0.0002387 0.0007183 0.9934307 406 63879 646 724 POLYGON ((-97.51263 35.2038… 964
14 9 200 256 240 566 104 128 250 347 35 10 10308637 1990 75 202 45 155 168 6832 19 384 705 1467 6832 40017301008 Census Tract 3010.08, Canadian County, Oklahoma 0 71 699 27 796 510 71 216 10 0.0006627 0.0000772 0.0007400 1.0000000 867 112361 1771 219 POLYGON ((-97.70691 35.4496… 1123
from_GEOID to_GEOID drive_time total_time_mins num_transfers access_wt init_wt ivtt transfer_wt egress_wt ovtt perceived_time fare drive_miles drive_cost carpool_cost_hbo carpool_cost_hbw carpool_cost_nhb HBO_flow HBW_flow NHB_flow utility_transit_HBO utility_SOV_HBO utility_HOV_HBO exp_u_SOV_HBO exp_u_HOV_HBO exp_u_transit_HBO total_utility_HBO utility_transit_HBW utility_SOV_HBW utility_HOV_HBW exp_u_SOV_HBW exp_u_HOV_HBW exp_u_transit_HBW total_utility_HBW utility_transit_NHB utility_SOV_NHB utility_HOV_NHB exp_u_SOV_NHB exp_u_HOV_NHB exp_u_transit_NHB total_utility_NHB p_transit_HBO p_SOV_HBO p_HOV_HBO p_transit_HBW p_SOV_HBW p_HOV_HBW p_transit_NHB p_SOV_NHB p_HOV_NHB n_transit_HBO n_SOV_HBO n_HOV_HBO n_transit_HBW n_SOV_HBW n_HOV_HBW n_transit_NHB n_SOV_NHB n_HOV_NHB n_total_total n_transit_total n_SOV_total n_HOV_total n_total_HBO n_total_HBW n_total_NHB
40027202007 40109107217 19.17474 NA NA NA NA NA NA NA NA NA NA 6.302551 4.128171 1.5233103 1.7058558 1.5011531 17 3 8 NA -0.5942791 -0.4859234 0.5519603 0.6151289 NA 1.1670892 NA 0.1070066 -2.582577 1.1129416 0.0755790 NA 1.1885206 NA -0.3054364 -0.3520506 0.7368018 0.7032445 NA 1.4400463 0 0.4729375 0.5270625 0 0.9364092 0.0635908 0 0.5116515 0.4883485 0 8 9 0 3 0 0 4 4 28 0 15 13 17 3 8
40027202007 40109103602 19.97458 NA NA NA NA NA NA NA NA NA NA 7.975788 5.224141 1.9277274 2.1587360 1.8996877 14 7 7 NA -0.6340607 -0.5056499 0.5304335 0.6031135 NA 1.1335470 NA 0.0782988 -2.608520 1.0814457 0.0736435 NA 1.1550892 NA -0.3504016 -0.3740005 0.7044051 0.6879766 NA 1.3923817 0 0.4679413 0.5320587 0 0.9362444 0.0637556 0 0.5058994 0.4941006 0 7 7 0 7 0 0 4 3 28 0 18 10 14 7 7
40027202007 40027202002 13.36041 NA NA NA NA NA NA NA NA NA NA 2.792722 1.829233 0.6749936 0.7558813 0.6651755 131 15 70 NA -0.4694666 -0.4031789 0.6253357 0.6681925 NA 1.2935282 NA 0.2913219 -2.404063 1.3381953 0.0903501 NA 1.4285454 NA -0.1656138 -0.2605058 0.8473734 0.7706617 NA 1.6180351 0 0.4834341 0.5165659 0 0.9367537 0.0632463 0 0.5237052 0.4762948 0 63 68 0 14 1 0 37 33 216 0 114 102 131 15 70
40027202007 40027201508 25.63375 NA NA NA NA NA NA NA NA NA NA 11.331620 7.422211 2.7388234 3.0670295 2.6989859 14 14 3 NA -0.7543964 -0.5857634 0.4702944 0.5566807 NA 1.0269752 NA -0.1009280 -2.782201 0.9039981 0.0619021 NA 0.9659003 NA -0.4851888 -0.4626282 0.6155810 0.6296267 NA 1.2452076 0 0.4579414 0.5420586 0 0.9359125 0.0640875 0 0.4943601 0.5056399 0 6 8 0 13 1 0 1 2 31 0 20 11 14 14 3
40027202007 40017301008 31.53315 NA NA NA NA NA NA NA NA NA NA 14.361104 9.406523 3.4710417 3.8869930 3.4205538 7 4 3 NA -0.8709355 -0.6659917 0.4185598 0.5137638 NA 0.9323236 NA -0.2864426 -2.962709 0.7509302 0.0516787 NA 0.8026089 NA -0.6155645 -0.5513334 0.5403358 0.5761810 NA 1.1165168 0 0.4489427 0.5510573 0 0.9356115 0.0643885 0 0.4839477 0.5160523 0 3 4 0 4 0 0 1 2 14 0 8 6 7 4 3
40027202007 40109108221 42.48836 NA NA NA NA NA NA NA NA NA NA 22.623607 14.818463 5.4680673 6.1233316 5.3885318 1 1 0 NA -1.1374338 -0.8334575 0.3206408 0.4345442 NA 0.7551850 NA -0.6383701 -3.300981 0.5281525 0.0368470 NA 0.5649995 NA -0.9146658 -0.7367839 0.4006505 0.4786508 NA 0.8793013 0 0.4245858 0.5754142 0 0.9347840 0.0652160 0 0.4556464 0.5443536 0 0 1 0 1 0 0 0 0 2 0 1 1 1 1 0
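
The probability and trip-count columns in these rows appear to follow a standard multinomial logit calculation: each mode's exponentiated utility divided by the sum over available modes, then multiplied by the trip flow. The sketch below reproduces the home-based other (HBO) values for the first row (40027202007 to 40109107217) in base R. The helper name logit_shares, and the treatment of an NA utility as an unavailable mode, are assumptions made for illustration and are not taken from the report's code.

```r
# Multinomial logit shares. A mode with an NA utility (here, transit with
# no available route) is assumed to be unavailable and gets a share of 0.
logit_shares <- function(utilities) {
  exp_u <- exp(utilities)
  exp_u[is.na(exp_u)] <- 0
  exp_u / sum(exp_u)
}

# HBO utilities copied from the first row of the table
u_hbo <- c(transit = NA, SOV = -0.5942791, HOV = -0.4859234)
hbo_flow <- 17

p_hbo <- logit_shares(u_hbo)   # compare with p_transit_HBO, p_SOV_HBO, p_HOV_HBO
n_hbo <- round(p_hbo * hbo_flow)  # compare with n_transit_HBO, n_SOV_HBO, n_HOV_HBO

print(round(p_hbo, 7))
print(n_hbo)
```

The rounded trip counts match the n_SOV_HBO and n_HOV_HBO values in that row, which suggests, but does not confirm, that the report rounds to the nearest whole trip.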

Definition of Variables

GEOID - Numerical ID for census tract
NAME - Census tract, county, and state
tot_popE - Total number of people
no_vehE - Households with no vehicle present
pct_veh - Percent of households with at least one vehicle
total_hhsE - Total number of households
hh_1personE - Number of households with 1 person
hh_2personE - Number of households with 2 persons
hh_3personE - Number of households with 3 persons
hh_4person_plusE - Number of households with 4 or more persons
hh_u18_married_coupleE - Number of households with a person under 18 and a married couple head-of-household
hh_u18_singleparent_maleE - Number of households with a person under 18 and a single male head-of-household
hh_u18_singleparent_femaleE - Number of households with a person under 18 and a single female head-of-household
hh_65plusE - Number of households with a person age 65 or older
tot_disabledE - Number of people with a disability
inc_lt_10kE - Number of households with income less than $10,000
inc_btw_10k_15kE - Households with income between $10,000 and $15,000
inc_btw_15k_20kE - Households with income between $15,000 and $20,000
inc_btw_20k_25kE - Households with income between $20,000 and $25,000
inc_btw_25k_30kE - Households with income between $25,000 and $30,000
inc_btw_30k_35kE - Households with income between $30,000 and $35,000
inc_btw_35k_40kE - Households with income between $35,000 and $40,000
inc_btw_40k_45kE - Households with income between $40,000 and $45,000
inc_btw_45k_50kE - Households with income between $45,000 and $50,000
inc_btw_50k_60kE - Households with income between $50,000 and $60,000
inc_btw_60k_75kE - Households with income between $60,000 and $75,000
inc_btw_75k_100kE - Households with income between $75,000 and $100,000
inc_btw_100k_125kE - Households with income between $100,000 and $125,000
inc_btw_125k_150kE - Households with income between $125,000 and $150,000
inc_btw_150k_200kE - Households with income between $150,000 and $200,000
inc_gt_200kE - Households with income greater than $200,000
hh_povlevelE - Number of households below the poverty level
total_emp - Total number of people employed
basic_emp - Total number of people employed in the following sectors: Agriculture, Forestry, Fishing, and Hunting (CNS01); Mining and extraction (CNS02); Utilities (CNS03); Construction (CNS04); Manufacturing (CNS05); Wholesale trade (CNS06); Transportation and warehousing (CNS08)
retail_emp - Total number of people employed in retail
service_emp - Total number of people employed in remaining sectors
land_area_sqmeters - Land area in square meters
geometry - Geographic coordinates of census tract outlines
pop_density - People per square meter (calculated)
emp_density - Employees per square meter (calculated)
activity_density - People and employees per square meter (calculated)
from_GEOID - Trip origin
to_GEOID - Trip destination
drive_time - Total length of car trip in minutes
total_time_mins - Total length of transit trip in minutes
num_transfers - Number of transfers per transit trip
access_wt - Walk time to access transit
init_wt - Initial wait time for transit trip
ivtt - In-vehicle travel time for transit trip
transfer_wt - Wait time for transfers, set at a maximum of 15 minutes
egress_wt - Walk time to destination after transit trip
ovtt - Out-of-vehicle travel time for transit trip
perceived_time - Weighted travel time for transit trip (ovtt + 1.5*ivtt)
fare - Total cost of transit trip, assuming fares are $2 and transfers are free (based on a generalization of EMBARK fares)
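
As a check on the calculated fields, the sketch below recomputes the three density variables for tract 40109103602 from the sample row shown earlier (population 581, land area 988,281 square meters). Reading 988 as that tract's total employment is an inference from the printed densities, not something the table labels directly. The transit trip used for perceived_time is hypothetical.

```r
# Values for tract 40109103602 from the sample tract table.
tot_popE <- 581
total_emp <- 988               # inferred: which column holds employment is an assumption
land_area_sqmeters <- 988281

pop_density <- tot_popE / land_area_sqmeters        # people per square meter
emp_density <- total_emp / land_area_sqmeters       # employees per square meter
activity_density <- pop_density + emp_density       # people plus employees per square meter

# Hypothetical transit trip (minutes), using the weighting given in the definitions.
ovtt <- 12
ivtt <- 20
perceived_time <- ovtt + 1.5 * ivtt

print(signif(c(pop_density, emp_density, activity_density), 5))
print(perceived_time)
```

The three densities agree with the values printed in the sample row (0.0005879, 0.0009997, and 0.0015876) to the precision shown.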

Code

Stacked Bar Generator

library(dplyr)     # filter(), select(), %>%
library(reshape2)  # melt() is assumed to come from reshape2
library(ggplot2)

# Stacked bar of transit time components from one origin tract to every destination tract.
transit_stackedbar_generator <- function(iso_origin){
  temp_matrix <- full_travel_matrix %>%
    filter(from_GEOID == iso_origin) %>%
    select(to_GEOID, access_wt, init_wt, ivtt, transfer_wt, egress_wt)

  # Reshape to long format: one row per destination and time component
  temp_matrix_stacked <- melt(temp_matrix, id = "to_GEOID")

  ggplot(temp_matrix_stacked, aes(x = reorder(to_GEOID, value), y = value, fill = variable)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = c("steelblue", "thistle", "gold", "red", "green"),
                      labels = c("Walk time", "Wait time", "Travel time", "Transfer time", "Egress time")) +
    scale_y_continuous(labels = scales::comma) +
    labs(y = "Number of Minutes",
         x = " ",
         fill = "Public Transportation Activity",
         title = paste("Transit Time to all zones from", iso_origin, "(minutes)")) +
    labs(subtitle = "Figs. 15 and 16 - Travel Time Makes Up the Bulk of Most Trips,\nTransfer Wait Time is Also A Large Portion of Trip Duration") +
    theme(legend.position = "top",
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank())
}

# x stands for the GEOID of the origin tract to plot
transit_stackedbar_generator(x)

Isochrone Generator

library(sf)        # geom_sf() requires sf geometries
library(ggspatial) # annotation_map_tile(), annotation_scale()

# Map of drive time from one origin tract to every other tract.
drive_isochrone_generator <- function(iso_origin){
  temp_matrix <- full_travel_matrix %>%
    filter(from_GEOID == iso_origin)

  # get tract geometry from ok_msa
  iso_drivetime_from_origin <- full_tract_info %>%
    select(GEOID, geometry) %>%
    full_join(temp_matrix, by = c("GEOID" = "to_GEOID"))

  selected_centroid <- full_centroid_geom %>%
    filter(GEOID == iso_origin)

  # change NA to 0
  iso_drivetime_from_origin[is.na(iso_drivetime_from_origin)] <- 0

  ggplot(iso_drivetime_from_origin) +
    annotation_map_tile(zoomin = 0, progress = "none", type = "cartolight") +
    geom_sf(aes(fill = drive_time)) +
    scale_fill_gradient2(low = "white", mid = "thistle", high = "orchid4",
                         midpoint = 50, # median(iso_drivetime_from_origin$travel_time_mins),
                         name = paste("Drive time to all zones\nfrom", iso_origin, "(minutes)")) +
    labs(subtitle = "Figs. 9, 10, and 11 - Drive-time Isochrones Show Access to All Zones in the Study Area") +
    geom_sf(data = selected_centroid, shape = 19, size = 1.5) +
    annotation_scale(location = "br", text_cex = 0.5) +
    theme()
}

# Map of transit time from one origin tract; tracts with no transit route stay NA.
transit_isochrone_generator <- function(iso_origin){
  temp_matrix <- full_travel_matrix %>%
    filter(from_GEOID == iso_origin)

  # get tract geometry from ok_msa
  iso_transittime_from_origin <- full_tract_info %>%
    select(GEOID, geometry) %>%
    full_join(temp_matrix, by = c("GEOID" = "to_GEOID"))

  selected_centroid <- full_centroid_geom %>%
    filter(GEOID == iso_origin)

  ggplot(iso_transittime_from_origin) +
    annotation_map_tile(zoomin = 0, progress = "none", type = "cartolight") +
    geom_sf(aes(fill = total_time_mins)) +
    scale_fill_gradient2(low = "white", mid = "thistle", high = "orchid4",
                         midpoint = 50, # median(iso_drivetime_from_origin$travel_time_mins),
                         name = paste("Transit time to all zones\nfrom", iso_origin, "(minutes)")) +
    labs(subtitle = "Figs. 12, 13, and 14 - Transit-time Isochrones Show Limited Access within the Study Area") +
    geom_sf(data = selected_centroid, shape = 19, size = 1.5) +
    annotation_scale(location = "br", text_cex = 0.5) +
    theme()
}

# x stands for the GEOID of the origin tract to map
drive_isochrone_generator(x)
transit_isochrone_generator(x)

Cumulative Accessibility

# Count, for each origin tract, the destination tracts reachable in under 30 minutes by car
drive_accessibility_matrix <- full_travel_matrix %>%
  filter(drive_time < 30) %>%
  group_by(from_GEOID) %>%
  tally() %>%
  rename(score = n)

drive_accessibility_geom <- full_tract_info %>%
  select(GEOID, geometry) %>%
  left_join(drive_accessibility_matrix, by = c("GEOID" = "from_GEOID"))

ggplot(drive_accessibility_geom) +
  annotation_map_tile(zoomin = 0, progress = "none", type = "cartolight") +
  geom_sf(aes(fill = score)) +
  labs(subtitle = "Fig. 8 - Zonal-Based Accessibility Shows Greatest Access to Other Zones\nCentered in Downtown Oklahoma City") +
  annotation_scale(location = "br", text_cex = 0.5) +
  theme()
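
To make the cumulative measure concrete, the sketch below applies the same 30-minute threshold in base R to the six drive times listed for origin 40027202007 in the sample travel-matrix rows. It is an illustration on a handful of rows, not the full computation, and the data frame name od_sample is introduced here only for the example.

```r
# Six origin-destination pairs copied from the sample travel-matrix rows
od_sample <- data.frame(
  from_GEOID = "40027202007",
  to_GEOID = c("40109107217", "40109103602", "40027202002",
               "40027201508", "40017301008", "40109108221"),
  drive_time = c(19.17474, 19.97458, 13.36041, 25.63375, 31.53315, 42.48836),
  stringsAsFactors = FALSE
)

# Keep destinations under 30 minutes, then count them per origin
within_30 <- od_sample[od_sample$drive_time < 30, ]
score <- table(within_30$from_GEOID)

print(score)
```

Four of the six sample destinations fall under the threshold. Because the count is taken after filtering, an origin with no destination under 30 minutes is absent from the result and shows up as NA once joined back to the tract geometry.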

Gravity-based Accessibility Functions

# accessibility_summary (one row per from_GEOID, with drive_index and transit_index)
# is assumed to have been computed before this point.
accessibility_summary <- accessibility_summary %>%
  left_join(full_tract_info %>% select(GEOID, geometry), by = c("from_GEOID" = "GEOID")) %>%
  as.data.frame()

tract_accessibility <- full_tract_info %>%
  select(GEOID, geometry) %>%
  left_join(accessibility_summary, by = c("GEOID" = "from_GEOID"))

ggplot(tract_accessibility) +
  annotation_map_tile(type = "cartolight", zoomin = 0, progress = "none") +
  geom_sf(aes(fill = drive_index), color = NA, alpha = 0.6) +
  scale_fill_viridis_c(name = "Drive accessibility",
                       # trans = "log",
                       # breaks = c(0.000001, 100),
                       # labels = c("Low",
                       #            "High"),
                       option = "A") +
  labs(subtitle = "Fig. 17 - Accessibility by Car is Strongest in Downtown Oklahoma City,\nSome Zones on the Periphery Indicating Very Low Accessibility") +
  theme()

ggplot(tract_accessibility) +
  annotation_map_tile(type = "cartolight", zoomin = 0, progress = "none") +
  geom_sf(aes(fill = transit_index), color = NA, alpha = 0.6) +
  scale_fill_viridis_c(name = "Transit accessibility",
                       trans = "log",
                       breaks = c(0.000001, 100),
                       # labels = c("Low",
                       #            "High"),
                       option = "A") +
  labs(subtitle = "Fig. 18 - Accessibility by Transit is Concentrated in the Downtown Area,\nSome Accessibility around a Southern Transit Hub in Norman") +
  theme()

References

  • Agrawal et al. “Getting Around When You’re Just Getting By: The Travel Behavior and Transportation Expenditures of Low-Income Adults.” Mineta Transportation Institute, January 1, 2011. https://rosap.ntl.bts.gov/view/dot/18612.
  • Rosenbloom, Sandra. “The Impact of Growing Children on Their Parents’ Travel Behavior: A Comparative Analysis.” Transportation Research Board, 1987. https://onlinepubs.trb.org/Onlinepubs/trr/1987/1135/1135-003.pdf.
  • Boschmann E.E., Brady S. Travel behaviors, sustainable mobility, and transit-oriented developments: A travel count analysis of older adults in the Denver, Colorado metropolitan area. J. Transp. Geogr. 2013;33:1–11. doi: 10.1016/j.jtrangeo.2013.09.001.